Hut101 19 jhwisdom #577

Open
jhwisdom wants to merge 2 commits into maps-as-data:main from jhwisdom:hut101-19-jhwisdom

jhwisdom commented Feb 4, 2026

Summary

This pull request adds support for Hugging Face models within the ClassifierContainer. Previously, users had to manually load Hugging Face models and feature extractors before passing them to the container. Now, when a user passes a Hugging Face repository path and sets huggingface=True, the container handles the initialization automatically. (This is part of my participation in the hut101 opportunities.)

Fixes #192

Describe your changes

  • Updated ClassifierContainer.__init__: Added a huggingface boolean flag (defaulting to False).
  • Integrated transformers library:
    • Implemented conditional loading of models using AutoModelForImageClassification.from_pretrained.
    • Added ignore_mismatched_sizes=True to allow easy fine-tuning on custom labels.
  • Compatibility:
    • Set self.is_inception = False for HF models to bypass legacy Inception-specific logic while maintaining the existing _get_logits workflow.
    • Used getattr to dynamically set self.input_size from the processor's configuration, ensuring compatibility across different HF models.
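As a rough illustration of the changes listed above, the Hugging Face branch might look something like the sketch below. It is not the code from this PR: the function name `load_hf_classifier` and the `device` default are assumptions made for illustration, and `transformers` is imported lazily so that it is only needed when the function is called. Only `AutoModelForImageClassification.from_pretrained`, `AutoImageProcessor.from_pretrained`, `num_labels` and `ignore_mismatched_sizes=True` are taken from the description.

```python
def load_hf_classifier(repo_id: str, num_labels: int, device: str = "cpu"):
    """Load a Hugging Face image classifier and its processor (illustrative sketch).

    `load_hf_classifier` is a hypothetical helper, not part of MapReader.
    """
    # Imported lazily so transformers is only required on the Hugging Face path.
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    model = AutoModelForImageClassification.from_pretrained(
        repo_id,
        num_labels=num_labels,
        # Allow loading when the checkpoint's classification head has a
        # different number of classes; the mismatched weights are re-initialised.
        ignore_mismatched_sizes=True,
    ).to(device)
    processor = AutoImageProcessor.from_pretrained(repo_id)
    return model, processor
```

Calling it with a repository path and the number of custom labels would return a model with a freshly initialised head, which is what makes fine-tuning on custom labels straightforward.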

Checklist before assigning a reviewer (update as needed)

  • Self-review code
  • Ensure submission passes current tests
  • Add tests (Tested manually with HF models)
  • Update relevant docs
  • Update changelog

Reviewer checklist

Please add anything you want reviewers to specifically focus/comment on.

  • Everything looks ok?

rwood-97 (Collaborator) left a comment

Hi Jihye,

I've added a few small comments.

Could you also have a go at updating the docs? This is the file you'd need to edit: https://github.com/maps-as-data/MapReader/blob/main/docs/source/using-mapreader/step-by-step-guide/4-classify/train.rst

The unit tests are failing at the moment, but this isn't your fault. I think it is a dependency issue, so I will try to fix these in a separate branch.
The only one you need to fix is the check changelog test, which checks that you've updated CHANGELOG.md as part of your PR. To fix it, you just need to add your changes to the CHANGELOG.md file.

Thanks for doing this :)

)

self.labels_map = labels_map
self.huggingface = huggingface
Collaborator

I would skip setting self.huggingface since it's only referenced further down in the init, and instead just use the huggingface value in the if statement below.

num_labels=num_labels,
ignore_mismatched_sizes=True
).to(self.device)
self.hf_processor = AutoImageProcessor.from_pretrained(model)
Collaborator

I think this could also be just hf_processor instead of an attribute self.hf_processor, since it isn't used outside of this function.

is_inception: bool = False,
load_path: str | None = None,
force_device: bool = False,
huggingface=False,
Collaborator

Can you add a type hint here?

Suggested change
- huggingface=False,
+ huggingface: bool = False,

ignore_mismatched_sizes=True
).to(self.device)
self.hf_processor = AutoImageProcessor.from_pretrained(model)
self.input_size = getattr(self.hf_processor, "size", {"height": 224})["height"]
Collaborator

Looking here, there seem to be three options for how size is defined.

Could you implement these, i.e.:

Suggested change
- self.input_size = getattr(self.hf_processor, "size", {"height": 224})["height"]
+ size = getattr(hf_processor, "size", {})
+ if "height" in size and "width" in size:
+     self.input_size = (size["height"], size["width"])
+ elif "shortest_edge" in size:
+     self.input_size = (size["shortest_edge"], size["shortest_edge"])
+ else:
+     self.input_size = input_size
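
The three cases above could also be pulled into a small helper, which makes them easy to check in isolation. This is only a sketch under a few assumptions: `resolve_input_size` and its `default` argument are hypothetical names, `size` is assumed to be a plain dict (or missing, or `None`), and `SimpleNamespace` objects stand in for real image processors so nothing needs to be downloaded.

```python
from types import SimpleNamespace


def resolve_input_size(processor, default=(224, 224)):
    """Return (height, width) from a processor's `size`, or `default` if unknown."""
    # Treat a missing or None `size` as an empty dict so the checks below are safe.
    size = getattr(processor, "size", None) or {}
    if "height" in size and "width" in size:
        return (size["height"], size["width"])
    if "shortest_edge" in size:
        # Only one edge is given, so assume a square input.
        return (size["shortest_edge"], size["shortest_edge"])
    return default


# Stand-in processors covering the three cases.
explicit = SimpleNamespace(size={"height": 384, "width": 384})
shortest = SimpleNamespace(size={"shortest_edge": 256})
missing = SimpleNamespace()

print(resolve_input_size(explicit))  # (384, 384)
print(resolve_input_size(shortest))  # (256, 256)
print(resolve_input_size(missing))   # (224, 224)
```

In the container itself, the `default` would be the `input_size` argument already passed to `__init__`, as in the suggestion above.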

Development

Successfully merging this pull request may close these issues.

Set up a function/method which allows us to load/save HF models easily in the mapreader pipeline.

2 participants